Install deliberr package

# remotes::install_github("gumbelino/deliberr")
library(deliberr)
## Warning: replacing previous import 'ggplot2::alpha' by 'psych::alpha' when
## loading 'deliberr'
lsf.str("package:deliberr")
## get_dri : function (ic, adjusted = TRUE)  
## get_dri_alpha : function (data)  
## get_dri_ic : function (data)  
## get_dri_ind : function (ic)  
## permute_dri : function (data, iterations = 10000, verbose = FALSE, summary = TRUE)  
## plot_dri_ic : function (ic, title = NA, suffix = NA, dri = NA)  
## summarize_perm_dri : function (perms, type = "common")

Overview of data for analysis of LLM roles

Large Language Models (LLMs) Preview

LLMs
Provider Model
1 anthropic claude-3-5-sonnet-20241022
2 anthropic claude-3-7-sonnet-20250219
3 google gemini-2.5-flash
4 xai grok-3-beta

Building on our previous analysis, we selected only top models.

Cases

Deliberative Cases
case survey N topic subtopic
1 UBC Bio biobanking_mayo_ubc 17 genomics genomics
2 CCPS ACT Deliberative ccps 31 climate climate
3 CSIRO WA energy_futures 17 climate energy
4 FNQCJ fnqcj 11 climate transportation
5 Forest Lay Citizen forestera 9 climate forest
6 Fremantle fremantle 41 transportation transportation
7 Activate uppsala_speaks 26 immigration immigration
8 Standard uppsala_speaks 22 immigration immigration
9 Winterthur zh_winterthur 16 climate climate

Also building on our previous analysis, we selected only deliberative cases.

Surveys

Surveys
survey considerations policies scale_max q_method
1 biobanking_mayo_ubc 38 7 11 FALSE
2 ccps 33 7 11 FALSE
3 energy_futures 45 9 11 FALSE
4 fnqcj 42 5 12 FALSE
5 forestera 45 7 11 FALSE
6 fremantle 36 6 11 TRUE
7 uppsala_speaks 42 7 7 FALSE
8 zh_winterthur 30 6 7 FALSE

Note that two of the cases (Activate and Standard) share the same survey, uppsala_speaks, so the 9 cases map onto 8 surveys.

Roles (System Prompts)

Number of Prompts by Type
type n
devils 1
ideology 10
perspective 10
System Prompts
uid type role description
1 csk devils climate skeptic prioritizes economic growth over CO2 emission cuts, fossil fuels over renewable energy, and does not believe in climate science
2 ana ideology anarchist rejects all coercive authority and hierarchical government, advocating stateless, voluntary societies
3 con ideology conservative seeks to preserve traditional institutions, customs, and values, favoring order and gradual change
4 eco ideology ecologist focuses on environmental protection and sustainability, advocating for societal change to ecological limits
5 fas ideology fascist promotes extreme nationalism, authoritarianism, militarism, and a totalitarian state
6 fem ideology feminist advocates for gender equality, challenging patriarchal structures and discrimination against women
7 fun ideology fundamentalist adheres strictly to core beliefs, often religious, applying these principles to all life aspects
8 lib ideology liberal advocates individual liberty, rights, limited government, and free markets, emphasizing individual autonomy
9 nat ideology nationalist prioritizes the interests and identity of a particular nation, often seeking self-determination
10 pop ideology populist appeals directly to “the people” against a perceived corrupt elite using anti-establishment rhetoric
11 soc ideology socialist aims for social ownership or control of production, emphasizing equality and collective welfare
12 coa perspective coastal resident endures chronic flooding and salinization, forced to relocate due to rising sea levels and intense storms worsened by climate change
13 ctr perspective construction worker suffers from extreme heat stress and lost work hours, perceiving climate change making outdoor labor unbearable and life-threatening
14 dis perspective disease survivor recovers from dengue fever, aware that climate change’s rising temperatures are expanding the range of disease-carrying mosquitoes in their region
15 eld perspective elderly urban resident endures intensified city heatwaves, struggling with disrupted services and feeling the direct, severe impact of climate change
16 far perspective displaced family loses their home due to unprecedented wildfires, experiencing displacement and recognizing climate change as the major driver of the devastation
17 fis perspective fisher notes his declining catches due to warming oceans, understanding that climate change is reorganizing marine life and reducing their traditional yield
18 lan perspective landowner surveys his parched fields after a prolonged drought, feeling the compounding impacts of climate change that reduce crop yields and family income
19 par perspective parent sees their child fall ill from a water-borne disease, attributing its spread to the increased heavy rainfall and warmer temperatures brought by climate change
20 sub perspective subsistence farmer watches his crops wither under erratic rainfall patterns, and who sees these changes as direct consequence of climate change
21 vil perspective villager faces dwindling, contaminated water supplies due to extended draughts and floods, aware that climate change is altering their water security

Summary of LLM Data Collection

We collected a total of 3361 LLM responses from 4 models across 8 surveys and 21 roles. We prompted each LLM 5 times with the same prompt.

Climate Analysis

Subset of cases used in the climate analysis
case survey N topic subtopic
1 CCPS ACT Deliberative ccps 31 climate climate
2 CSIRO WA energy_futures 17 climate energy
3 Winterthur zh_winterthur 16 climate climate
Subset of roles used in the climate analysis
uid type article role description
1 eco ideology an ecologist focuses on environmental protection and sustainability, advocating for societal change to ecological limits
2 coa perspective a coastal resident endures chronic flooding and salinization, forced to relocate due to rising sea levels and intense storms worsened by climate change
3 ctr perspective a construction worker suffers from extreme heat stress and lost work hours, perceiving climate change making outdoor labor unbearable and life-threatening
4 dis perspective a disease survivor recovers from dengue fever, aware that climate change’s rising temperatures are expanding the range of disease-carrying mosquitoes in their region
5 eld perspective an elderly urban resident endures intensified city heatwaves, struggling with disrupted services and feeling the direct, severe impact of climate change
6 far perspective a displaced family loses their home due to unprecedented wildfires, experiencing displacement and recognizing climate change as the major driver of the devastation
7 fis perspective a fisher notes his declining catches due to warming oceans, understanding that climate change is reorganizing marine life and reducing their traditional yield
8 lan perspective a landowner surveys his parched fields after a prolonged drought, feeling the compounding impacts of climate change that reduce crop yields and family income
9 par perspective a parent sees their child fall ill from a water-borne disease, attributing its spread to the increased heavy rainfall and warmer temperatures brought by climate change
10 sub perspective a subsistence farmer watches his crops wither under erratic rainfall patterns, and who sees these changes as direct consequence of climate change
11 vil perspective a villager faces dwindling, contaminated water supplies due to extended draughts and floods, aware that climate change is altering their water security
12 csk devils a climate skeptic prioritizes economic growth over CO2 emission cuts, fossil fuels over renewable energy, and does not believe in climate science

For the climate analysis, we selected a subset of 721 responses generated by 4 models across 3 surveys and the 12 roles described above. We prompted each LLM 5 times with the same prompt.

We calculated one DRI value per model/survey/role by treating each LLM response as one participant in a deliberation. The role “all” indicates that all roles were part of that deliberation (n = 60 participants, which equals 5 participants for each of the 12 roles). See example below.

Consistency results

Head (5) of DRI consistency across climate roles
model survey role dri alpha_c alpha_p alpha_all n
claude-3-5-sonnet-20241022 ccps all 0.417 0.991 0.623 0.988 60
claude-3-5-sonnet-20241022 ccps coa 0.437 0.792 0.590 0.836 5
claude-3-5-sonnet-20241022 ccps csk 0.158 0.768 0.778 0.746 5
claude-3-5-sonnet-20241022 ccps ctr 0.380 0.942 0.740 0.934 5
claude-3-5-sonnet-20241022 ccps dis 0.468 0.866 0.733 0.868 5
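
The `dri` column above is computed by the package from each group of responses. As a rough illustration of the idea, the sketch below computes a group-level DRI in base R from synthetic ratings. It is not the deliberr implementation: the helper names `pairwise_cor` and `sketch_dri` are hypothetical, and the distance-from-diagonal form and `lambda` rescaling are assumptions taken from the published definition of the index.

```r
# Illustrative sketch only; not the deliberr implementation.
# Rows are participants (here: LLM responses), columns are survey items.

# Pairwise Spearman correlations between participants (one value per pair).
pairwise_cor <- function(ratings) {
  r <- cor(t(ratings), method = "spearman")
  r[upper.tri(r)]
}

# Group-level DRI from consideration ratings and policy rankings.
# ASSUMPTION: the distance-from-diagonal form and the lambda rescaling are
# taken from the published DRI definition; check the package source before
# relying on the exact constants.
sketch_dri <- function(considerations, policies) {
  q <- pairwise_cor(considerations)  # agreement on considerations
  p <- pairwise_cor(policies)        # agreement on policy preferences
  d <- abs(q - p) / sqrt(2)          # distance of each pair from the 1:1 line
  lambda <- 1 - sqrt(2) / 2
  2 * (((1 - mean(d)) - lambda) / (1 - lambda)) - 1
}

set.seed(1)
# Synthetic data: 5 responses, 10 considerations, 5 ranked policies.
considerations <- t(replicate(5, sample(1:10)))
policies <- t(replicate(5, sample(1:5)))

dri <- sketch_dri(considerations, policies)
print(round(dri, 3))
```

Under these assumptions, a group whose pairwise agreement on considerations exactly matches its pairwise agreement on policies sits on the 1:1 line and gets a DRI of 1.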

Consistency data (DRI and Cronbach’s alpha)
role variable n min max median iqr mean sd se ci
1 csk dri 12 0.158 0.922 0.799 0.055 0.746 0.203 0.059 0.129
2 eco dri 12 0.326 0.954 0.720 0.400 0.658 0.230 0.066 0.146
3 par dri 12 0.220 0.901 0.623 0.328 0.619 0.241 0.070 0.153
4 far dri 12 0.224 0.937 0.608 0.268 0.612 0.201 0.058 0.128
5 sub dri 12 -0.029 0.875 0.744 0.396 0.612 0.277 0.080 0.176
6 all dri 12 0.417 0.759 0.637 0.161 0.604 0.113 0.033 0.072
7 fis dri 12 0.325 0.899 0.606 0.180 0.594 0.166 0.048 0.105
8 eld dri 12 0.246 0.848 0.638 0.453 0.581 0.229 0.066 0.146
9 coa dri 12 0.167 0.906 0.570 0.282 0.573 0.239 0.069 0.152
10 lan dri 12 0.138 0.830 0.578 0.177 0.573 0.191 0.055 0.122
11 vil dri 12 -0.119 0.878 0.594 0.049 0.551 0.247 0.071 0.157
12 dis dri 12 0.237 0.942 0.494 0.086 0.510 0.171 0.049 0.109
13 ctr dri 12 -0.023 0.840 0.549 0.341 0.505 0.259 0.075 0.165
14 all alpha_c 12 0.981 0.991 0.989 0.003 0.988 0.003 0.001 0.002
15 eld alpha_c 12 0.848 0.962 0.909 0.041 0.909 0.031 0.009 0.020
16 lan alpha_c 12 0.833 0.961 0.920 0.051 0.908 0.039 0.011 0.025
17 fis alpha_c 12 0.851 0.941 0.904 0.039 0.903 0.027 0.008 0.017
18 ctr alpha_c 12 0.782 0.951 0.899 0.031 0.896 0.043 0.012 0.027
19 sub alpha_c 12 0.843 0.936 0.907 0.051 0.896 0.031 0.009 0.020
20 dis alpha_c 12 0.803 0.949 0.910 0.048 0.895 0.043 0.012 0.027
21 par alpha_c 12 0.722 0.941 0.908 0.030 0.895 0.058 0.017 0.037
22 coa alpha_c 12 0.750 0.945 0.883 0.060 0.880 0.059 0.017 0.037
23 vil alpha_c 12 0.817 0.928 0.895 0.067 0.880 0.041 0.012 0.026
24 far alpha_c 12 0.702 0.942 0.872 0.040 0.868 0.070 0.020 0.045
25 eco alpha_c 12 0.793 0.950 0.862 0.025 0.866 0.041 0.012 0.026
26 csk alpha_c 12 0.189 0.915 0.826 0.161 0.752 0.206 0.059 0.131
27 csk alpha_p 12 0.718 0.917 0.839 0.178 0.821 0.082 0.024 0.052
28 sub alpha_p 12 0.682 0.882 0.844 0.088 0.810 0.070 0.020 0.044
29 ctr alpha_p 12 0.673 0.949 0.832 0.148 0.803 0.100 0.029 0.063
30 far alpha_p 12 0.632 0.909 0.800 0.104 0.796 0.085 0.024 0.054
31 all alpha_p 12 0.623 0.867 0.784 0.047 0.784 0.064 0.018 0.041
32 eco alpha_p 12 0.529 0.940 0.760 0.124 0.784 0.118 0.034 0.075
33 fis alpha_p 12 0.567 0.884 0.792 0.067 0.778 0.082 0.024 0.052
34 par alpha_p 12 0.598 0.882 0.802 0.115 0.778 0.084 0.024 0.053
35 dis alpha_p 12 0.681 0.894 0.768 0.075 0.775 0.068 0.020 0.043
36 lan alpha_p 12 0.658 0.868 0.789 0.073 0.772 0.063 0.018 0.040
37 eld alpha_p 12 0.647 0.871 0.794 0.128 0.771 0.077 0.022 0.049
38 coa alpha_p 12 0.590 0.889 0.775 0.066 0.759 0.085 0.025 0.054
39 vil alpha_p 12 0.367 0.892 0.798 0.054 0.749 0.153 0.044 0.097
40 all alpha_all 12 0.982 0.989 0.988 0.001 0.987 0.002 0.001 0.002
41 fis alpha_all 12 0.882 0.942 0.925 0.020 0.918 0.019 0.005 0.012
42 lan alpha_all 12 0.844 0.964 0.920 0.034 0.914 0.035 0.010 0.022
43 sub alpha_all 12 0.843 0.949 0.921 0.036 0.910 0.029 0.008 0.019
44 ctr alpha_all 12 0.839 0.944 0.914 0.036 0.909 0.029 0.008 0.019
45 eld alpha_all 12 0.854 0.961 0.901 0.042 0.909 0.032 0.009 0.021
46 par alpha_all 12 0.812 0.942 0.916 0.035 0.904 0.039 0.011 0.025
47 dis alpha_all 12 0.868 0.923 0.904 0.031 0.900 0.019 0.005 0.012
48 vil alpha_all 12 0.802 0.932 0.912 0.049 0.896 0.038 0.011 0.024
49 coa alpha_all 12 0.805 0.941 0.905 0.051 0.890 0.043 0.012 0.027
50 far alpha_all 12 0.694 0.947 0.903 0.041 0.884 0.067 0.019 0.043
51 eco alpha_all 12 0.843 0.933 0.875 0.040 0.882 0.027 0.008 0.017
52 csk alpha_all 12 0.718 0.947 0.852 0.078 0.854 0.070 0.020 0.045

Note that each role has 12 data points: 3 surveys x 4 models.
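
The columns of the summary table can be approximated in base R. The sketch below uses the 12 DRI values for the role "all" from the detailed data table. The helper `summarize_values` is hypothetical, and treating `ci` as a t-based 95% half-width is an assumption. It is consistent with the reported numbers, but the source does not state how `ci` was computed.

```r
# DRI for role "all", one value per model/survey, copied from the detailed
# data table (order within each model: ccps, energy_futures, zh_winterthur).
dri_all <- c(0.417, 0.497, 0.624,   # claude-3-5-sonnet-20241022
             0.676, 0.591, 0.649,   # claude-3-7-sonnet-20250219
             0.711, 0.527, 0.677,   # gemini-2.5-flash
             0.427, 0.691, 0.759)   # grok-3-beta

# Hypothetical helper mirroring the columns of the summary table.
summarize_values <- function(x, level = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)  # standard error of the mean
  c(n = n, min = min(x), max = max(x), median = median(x), iqr = IQR(x),
    mean = mean(x), sd = sd(x), se = se,
    # ASSUMPTION: ci is the half-width of a t-based confidence interval.
    ci = qt(1 - (1 - level) / 2, df = n - 1) * se)
}

stats_all <- summarize_values(dri_all)
print(round(stats_all, 3))
```

Because the inputs are already rounded to three decimals, the results should match the `all` / `dri` row of the table only up to rounding in the last digit.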

We found that LLMs are consistent across roles in terms of both DRI and Cronbach’s alpha (policies). With all roles pooled (role “all”), the high DRI (median = 0.637; IQR = 0.161) suggests that LLMs tend to consistently align their considerations and policy preferences. The high Cronbach’s alpha for their policy preferences (median = 0.784; IQR = 0.047) suggests that LLMs tend to agree on the ranking of their policy preferences.
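
To make the alpha values concrete, here is a minimal base R sketch of Cronbach’s alpha on synthetic rankings. The function `cronbach_alpha` is a hypothetical helper, not the package's `get_dri_alpha`. It also assumes that each response is an "item" (column) and each policy a case (row), which the source does not confirm.

```r
# Cronbach's alpha for a matrix with cases in rows and items in columns.
# ASSUMPTION: to measure agreement among responses, each LLM response is an
# "item" (column) and each policy option is a case (row).
cronbach_alpha <- function(x) {
  k <- ncol(x)                     # number of items
  item_var <- apply(x, 2, var)     # variance of each item
  total_var <- var(rowSums(x))     # variance of the summed score
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

# Synthetic example: 7 policies ranked by 5 responses that mostly agree.
ranks <- cbind(
  r1 = c(1, 2, 3, 4, 5, 6, 7),
  r2 = c(1, 3, 2, 4, 5, 7, 6),
  r3 = c(2, 1, 3, 4, 6, 5, 7),
  r4 = c(1, 2, 4, 3, 5, 6, 7),
  r5 = c(1, 2, 3, 5, 4, 6, 7)
)
alpha_p <- cronbach_alpha(ranks)
print(round(alpha_p, 3))
```

When all responses give identical rankings, alpha equals 1. The more the rankings diverge, the lower it falls.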

Summary for model

Mean DRI across models and roles
role claude-3-5-sonnet-20241022 claude-3-7-sonnet-20250219 gemini-2.5-flash grok-3-beta best
1 all 0.512 0.639 0.638 0.625 claude-3-7-sonnet-20250219
2 coa 0.350 0.565 0.810 0.567 gemini-2.5-flash
3 csk 0.543 0.773 0.875 0.795 gemini-2.5-flash
4 ctr 0.343 0.567 0.663 0.447 gemini-2.5-flash
5 dis 0.476 0.538 0.569 0.455 gemini-2.5-flash
6 eco 0.364 0.720 0.854 0.696 gemini-2.5-flash
7 eld 0.404 0.498 0.796 0.626 gemini-2.5-flash
8 far 0.479 0.651 0.821 0.497 gemini-2.5-flash
9 fis 0.497 0.593 0.685 0.602 gemini-2.5-flash
10 lan 0.595 0.633 0.477 0.587 claude-3-7-sonnet-20250219
11 par 0.498 0.708 0.598 0.670 claude-3-7-sonnet-20250219
12 sub 0.526 0.712 0.556 0.654 claude-3-7-sonnet-20250219
13 vil 0.581 0.604 0.407 0.613 grok-3-beta
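
As an illustration of how a row of this table and its `best` column could be derived, the sketch below averages the DRI values for the role "all" over the three surveys and picks the model with the highest mean. The input values are copied from the detailed data table. The object names are hypothetical and this is not necessarily how the table was produced.

```r
# DRI for role "all", copied from the detailed data table
# (order within each model: ccps, energy_futures, zh_winterthur).
dri_all <- data.frame(
  model = rep(c("claude-3-5-sonnet-20241022", "claude-3-7-sonnet-20250219",
                "gemini-2.5-flash", "grok-3-beta"), each = 3),
  dri = c(0.417, 0.497, 0.624,
          0.676, 0.591, 0.649,
          0.711, 0.527, 0.677,
          0.427, 0.691, 0.759)
)

# Mean DRI per model, then the model with the highest mean.
mean_dri <- tapply(dri_all$dri, dri_all$model, mean)
best <- names(which.max(mean_dri))

print(round(mean_dri, 3))
print(best)
```

Because the inputs are already rounded to three decimals, the means may differ from the table in the last digit, but the ranking of models is unchanged.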

Summary Cronbach’s Alpha (Policies)

Mean alpha (policies) across models and roles
role claude-3-5-sonnet-20241022 claude-3-7-sonnet-20250219 gemini-2.5-flash grok-3-beta best
1 all 0.725 0.792 0.801 0.818 grok-3-beta
2 coa 0.713 0.745 0.771 0.807 grok-3-beta
3 csk 0.783 0.802 0.848 0.851 grok-3-beta
4 ctr 0.749 0.791 0.918 0.755 gemini-2.5-flash
5 dis 0.761 0.772 0.771 0.796 grok-3-beta
6 eco 0.764 0.844 0.814 0.716 claude-3-7-sonnet-20250219
7 eld 0.722 0.793 0.741 0.828 grok-3-beta
8 far 0.726 0.807 0.827 0.824 gemini-2.5-flash
9 fis 0.787 0.792 0.829 0.704 gemini-2.5-flash
10 lan 0.715 0.792 0.789 0.792 claude-3-7-sonnet-20250219
11 par 0.785 0.704 0.790 0.833 grok-3-beta
12 sub 0.841 0.800 0.761 0.839 claude-3-5-sonnet-20241022
13 vil 0.708 0.818 0.808 0.662 claude-3-7-sonnet-20250219

Summary Cronbach’s Alpha (Consideration)

Mean alpha (considerations) across models and roles
role claude-3-5-sonnet-20241022 claude-3-7-sonnet-20250219 gemini-2.5-flash grok-3-beta best
1 all 0.990 0.990 0.984 0.987 claude-3-5-sonnet-20241022
2 coa 0.863 0.918 0.849 0.891 claude-3-7-sonnet-20250219
3 csk 0.769 0.856 0.551 0.831 claude-3-7-sonnet-20250219
4 ctr 0.916 0.909 0.852 0.906 claude-3-5-sonnet-20241022
5 dis 0.905 0.921 0.859 0.896 claude-3-7-sonnet-20250219
6 eco 0.900 0.860 0.842 0.863 claude-3-5-sonnet-20241022
7 eld 0.917 0.899 0.917 0.903 claude-3-5-sonnet-20241022
8 far 0.905 0.848 0.815 0.905 claude-3-5-sonnet-20241022
9 fis 0.916 0.895 0.896 0.905 claude-3-5-sonnet-20241022
10 lan 0.917 0.914 0.884 0.917 claude-3-5-sonnet-20241022
11 par 0.925 0.905 0.830 0.922 claude-3-5-sonnet-20241022
12 sub 0.902 0.919 0.851 0.911 claude-3-7-sonnet-20250219
13 vil 0.881 0.880 0.873 0.887 grok-3-beta

Detailed data

DRI consistency across 12 climate roles
model survey role dri alpha_c alpha_p alpha_all n
1 claude-3-5-sonnet-20241022 ccps all 0.417 0.991 0.623 0.988 60
2 claude-3-5-sonnet-20241022 ccps coa 0.437 0.792 0.590 0.836 5
3 claude-3-5-sonnet-20241022 ccps csk 0.158 0.768 0.778 0.746 5
4 claude-3-5-sonnet-20241022 ccps ctr 0.380 0.942 0.740 0.934 5
5 claude-3-5-sonnet-20241022 ccps dis 0.468 0.866 0.733 0.868 5
6 claude-3-5-sonnet-20241022 ccps eco 0.340 0.863 0.757 0.898 5
7 claude-3-5-sonnet-20241022 ccps eld 0.322 0.909 0.673 0.901 5
8 claude-3-5-sonnet-20241022 ccps far 0.434 0.901 0.632 0.916 5
9 claude-3-5-sonnet-20241022 ccps fis 0.424 0.941 0.776 0.928 5
10 claude-3-5-sonnet-20241022 ccps lan 0.457 0.933 0.689 0.923 5
11 claude-3-5-sonnet-20241022 ccps par 0.520 0.915 0.728 0.896 5
12 claude-3-5-sonnet-20241022 ccps sub -0.029 0.870 0.798 0.883 5
13 claude-3-5-sonnet-20241022 ccps vil 0.600 0.866 0.791 0.802 5
14 claude-3-5-sonnet-20241022 energy_futures all 0.497 0.989 0.772 0.988 60
15 claude-3-5-sonnet-20241022 energy_futures coa 0.167 0.881 0.771 0.903 5
16 claude-3-5-sonnet-20241022 energy_futures csk 0.869 0.915 0.726 0.919 5
17 claude-3-5-sonnet-20241022 energy_futures ctr -0.023 0.896 0.685 0.885 5
18 claude-3-5-sonnet-20241022 energy_futures dis 0.477 0.922 0.763 0.917 5
19 claude-3-5-sonnet-20241022 energy_futures eco 0.326 0.950 0.679 0.933 5
20 claude-3-5-sonnet-20241022 energy_futures eld 0.246 0.909 0.693 0.929 5
21 claude-3-5-sonnet-20241022 energy_futures far 0.553 0.942 0.767 0.947 5
22 claude-3-5-sonnet-20241022 energy_futures fis 0.436 0.915 0.786 0.935 5
23 claude-3-5-sonnet-20241022 energy_futures lan 0.645 0.951 0.658 0.952 5
24 claude-3-5-sonnet-20241022 energy_futures par 0.535 0.919 0.776 0.939 5
25 claude-3-5-sonnet-20241022 energy_futures sub 0.846 0.922 0.882 0.921 5
26 claude-3-5-sonnet-20241022 energy_futures vil 0.558 0.928 0.517 0.931 5
27 claude-3-5-sonnet-20241022 zh_winterthur all 0.624 0.989 0.780 0.988 60
28 claude-3-5-sonnet-20241022 zh_winterthur coa 0.447 0.916 0.778 0.845 5
29 claude-3-5-sonnet-20241022 zh_winterthur csk 0.601 0.623 0.845 0.820 5
30 claude-3-5-sonnet-20241022 zh_winterthur ctr 0.672 0.912 0.822 0.901 5
31 claude-3-5-sonnet-20241022 zh_winterthur dis 0.484 0.927 0.786 0.914 5
32 claude-3-5-sonnet-20241022 zh_winterthur eco 0.425 0.887 0.855 0.859 5
33 claude-3-5-sonnet-20241022 zh_winterthur eld 0.645 0.933 0.799 0.892 5
34 claude-3-5-sonnet-20241022 zh_winterthur far 0.449 0.870 0.778 0.839 5
35 claude-3-5-sonnet-20241022 zh_winterthur fis 0.631 0.893 0.799 0.900 5
36 claude-3-5-sonnet-20241022 zh_winterthur lan 0.683 0.868 0.799 0.867 5
37 claude-3-5-sonnet-20241022 zh_winterthur par 0.440 0.941 0.850 0.910 5
38 claude-3-5-sonnet-20241022 zh_winterthur sub 0.761 0.913 0.844 0.901 5
39 claude-3-5-sonnet-20241022 zh_winterthur vil 0.584 0.847 0.816 0.865 5
40 claude-3-7-sonnet-20250219 ccps all 0.676 0.990 0.775 0.989 60
41 claude-3-7-sonnet-20250219 ccps coa 0.683 0.874 0.717 0.908 5
42 claude-3-7-sonnet-20250219 ccps csk 0.719 0.813 0.855 0.863 5
43 claude-3-7-sonnet-20250219 ccps ctr 0.769 0.951 0.682 0.944 5
44 claude-3-7-sonnet-20250219 ccps dis 0.544 0.927 0.732 0.916 5
45 claude-3-7-sonnet-20250219 ccps eco 0.867 0.862 0.887 0.873 5
46 claude-3-7-sonnet-20250219 ccps eld 0.576 0.890 0.732 0.899 5
47 claude-3-7-sonnet-20250219 ccps far 0.785 0.759 0.682 0.853 5
48 claude-3-7-sonnet-20250219 ccps fis 0.582 0.899 0.819 0.888 5
49 claude-3-7-sonnet-20250219 ccps lan 0.523 0.917 0.764 0.902 5
50 claude-3-7-sonnet-20250219 ccps par 0.770 0.902 0.682 0.921 5
51 claude-3-7-sonnet-20250219 ccps sub 0.780 0.920 0.682 0.923 5
52 claude-3-7-sonnet-20250219 ccps vil 0.585 0.817 0.798 0.872 5
53 claude-3-7-sonnet-20250219 energy_futures all 0.591 0.988 0.814 0.988 60
54 claude-3-7-sonnet-20250219 energy_futures coa 0.560 0.935 0.741 0.941 5
55 claude-3-7-sonnet-20250219 energy_futures csk 0.801 0.915 0.833 0.947 5
56 claude-3-7-sonnet-20250219 energy_futures ctr 0.420 0.902 0.842 0.929 5
57 claude-3-7-sonnet-20250219 energy_futures dis 0.568 0.911 0.689 0.901 5
58 claude-3-7-sonnet-20250219 energy_futures eco 0.774 0.859 0.706 0.901 5
59 claude-3-7-sonnet-20250219 energy_futures eld 0.288 0.930 0.789 0.942 5
60 claude-3-7-sonnet-20250219 energy_futures far 0.663 0.917 0.889 0.928 5
61 claude-3-7-sonnet-20250219 energy_futures fis 0.563 0.935 0.758 0.942 5
62 claude-3-7-sonnet-20250219 energy_futures lan 0.546 0.863 0.797 0.893 5
63 claude-3-7-sonnet-20250219 energy_futures par 0.813 0.924 0.598 0.921 5
64 claude-3-7-sonnet-20250219 energy_futures sub 0.791 0.936 0.849 0.949 5
65 claude-3-7-sonnet-20250219 energy_futures vil 0.622 0.910 0.798 0.924 5
66 claude-3-7-sonnet-20250219 zh_winterthur all 0.649 0.991 0.787 0.989 60
67 claude-3-7-sonnet-20250219 zh_winterthur coa 0.452 0.945 0.778 0.916 5
68 claude-3-7-sonnet-20250219 zh_winterthur csk 0.797 0.839 0.718 0.841 5
69 claude-3-7-sonnet-20250219 zh_winterthur ctr 0.512 0.874 0.848 0.894 5
70 claude-3-7-sonnet-20250219 zh_winterthur dis 0.504 0.924 0.894 0.880 5
71 claude-3-7-sonnet-20250219 zh_winterthur eco 0.517 0.860 0.939 0.877 5
72 claude-3-7-sonnet-20250219 zh_winterthur eld 0.630 0.875 0.857 0.854 5
73 claude-3-7-sonnet-20250219 zh_winterthur far 0.506 0.866 0.848 0.884 5
74 claude-3-7-sonnet-20250219 zh_winterthur fis 0.633 0.851 0.800 0.910 5
75 claude-3-7-sonnet-20250219 zh_winterthur lan 0.830 0.961 0.816 0.964 5
76 claude-3-7-sonnet-20250219 zh_winterthur par 0.543 0.888 0.833 0.812 5
77 claude-3-7-sonnet-20250219 zh_winterthur sub 0.564 0.902 0.870 0.929 5
78 claude-3-7-sonnet-20250219 zh_winterthur vil 0.606 0.912 0.857 0.914 5
79 gemini-2.5-flash ccps all 0.711 0.982 0.765 0.982 60
80 gemini-2.5-flash ccps coa 0.854 0.750 0.889 0.805 5
81 gemini-2.5-flash ccps csk 0.895 0.606 0.722 0.718 5
82 gemini-2.5-flash ccps ctr 0.784 0.782 0.948 0.839 5
83 gemini-2.5-flash ccps dis 0.942 0.880 0.831 0.889 5
84 gemini-2.5-flash ccps eco 0.826 0.841 0.940 0.872 5
85 gemini-2.5-flash ccps eld 0.848 0.848 0.647 0.876 5
86 gemini-2.5-flash ccps far 0.937 0.702 0.750 0.694 5
87 gemini-2.5-flash ccps fis 0.899 0.874 0.750 0.882 5
88 gemini-2.5-flash ccps lan 0.688 0.833 0.781 0.844 5
89 gemini-2.5-flash ccps par 0.703 0.869 0.706 0.883 5
90 gemini-2.5-flash ccps sub 0.875 0.861 0.844 0.891 5
91 gemini-2.5-flash ccps vil 0.753 0.882 0.725 0.912 5
92 gemini-2.5-flash energy_futures all 0.527 0.981 0.825 0.982 60
93 gemini-2.5-flash energy_futures coa 0.906 0.928 0.625 0.941 5
94 gemini-2.5-flash energy_futures csk 0.809 0.859 0.907 0.902 5
95 gemini-2.5-flash energy_futures ctr 0.620 0.893 0.857 0.915 5
96 gemini-2.5-flash energy_futures dis 0.313 0.895 0.681 0.907 5
97 gemini-2.5-flash energy_futures eco 0.853 0.866 0.753 0.893 5
98 gemini-2.5-flash energy_futures eld 0.771 0.939 0.871 0.954 5
99 gemini-2.5-flash energy_futures far 0.827 0.870 0.821 0.901 5
100 gemini-2.5-flash energy_futures fis 0.486 0.930 0.884 0.923 5
101 gemini-2.5-flash energy_futures lan 0.138 0.934 0.759 0.944 5
102 gemini-2.5-flash energy_futures par 0.220 0.900 0.823 0.910 5
103 gemini-2.5-flash energy_futures sub 0.375 0.850 0.733 0.893 5
104 gemini-2.5-flash energy_futures vil -0.119 0.910 0.892 0.932 5
105 gemini-2.5-flash zh_winterthur all 0.677 0.989 0.814 0.988 60
106 gemini-2.5-flash zh_winterthur coa 0.671 0.868 0.800 0.868 5
107 gemini-2.5-flash zh_winterthur csk 0.922 0.189 0.917 0.830 5
108 gemini-2.5-flash zh_winterthur ctr 0.586 0.880 0.949 0.912 5
109 gemini-2.5-flash zh_winterthur dis 0.451 0.803 0.800 0.885 5
110 gemini-2.5-flash zh_winterthur eco 0.882 0.819 0.750 0.843 5
111 gemini-2.5-flash zh_winterthur eld 0.769 0.962 0.704 0.961 5
112 gemini-2.5-flash zh_winterthur far 0.700 0.871 0.909 0.906 5
113 gemini-2.5-flash zh_winterthur fis 0.671 0.885 0.853 0.925 5
114 gemini-2.5-flash zh_winterthur lan 0.605 0.886 0.825 0.931 5
115 gemini-2.5-flash zh_winterthur par 0.872 0.722 0.840 0.851 5
116 gemini-2.5-flash zh_winterthur sub 0.419 0.843 0.705 0.843 5
117 gemini-2.5-flash zh_winterthur vil 0.588 0.825 0.806 0.863 5
118 grok-3-beta ccps all 0.427 0.990 0.731 0.987 60
119 grok-3-beta ccps coa 0.245 0.862 0.856 0.910 5
120 grok-3-beta ccps csk 0.786 0.900 0.917 0.921 5
121 grok-3-beta ccps ctr 0.223 0.924 0.870 0.940 5
122 grok-3-beta ccps dis 0.237 0.909 0.882 0.923 5
123 grok-3-beta ccps eco 0.666 0.877 0.855 0.859 5
124 grok-3-beta ccps eld 0.304 0.885 0.828 0.883 5
125 grok-3-beta ccps far 0.224 0.872 0.882 0.919 5
126 grok-3-beta ccps fis 0.325 0.885 0.837 0.926 5
127 grok-3-beta ccps lan 0.384 0.923 0.815 0.916 5
128 grok-3-beta ccps par 0.232 0.899 0.882 0.923 5
129 grok-3-beta ccps sub 0.378 0.918 0.853 0.937 5
130 grok-3-beta ccps vil 0.323 0.921 0.837 0.912 5
131 grok-3-beta energy_futures all 0.691 0.985 0.867 0.986 60
132 grok-3-beta energy_futures coa 0.878 0.885 0.760 0.912 5
133 grok-3-beta energy_futures csk 0.797 0.856 0.908 0.900 5
134 grok-3-beta energy_futures ctr 0.278 0.908 0.673 0.922 5
135 grok-3-beta energy_futures dis 0.514 0.949 0.773 0.923 5
136 grok-3-beta energy_futures eco 0.954 0.919 0.529 0.919 5
137 grok-3-beta energy_futures eld 0.805 0.908 0.821 0.902 5
138 grok-3-beta energy_futures far 0.515 0.935 0.754 0.917 5
139 grok-3-beta energy_futures fis 0.834 0.921 0.708 0.928 5
140 grok-3-beta energy_futures lan 0.552 0.927 0.692 0.926 5
141 grok-3-beta energy_futures par 0.901 0.937 0.836 0.942 5
142 grok-3-beta energy_futures sub 0.857 0.910 0.782 0.930 5
143 grok-3-beta energy_futures vil 0.640 0.907 0.367 0.918 5
144 grok-3-beta zh_winterthur all 0.759 0.988 0.855 0.987 60
145 grok-3-beta zh_winterthur coa 0.580 0.926 0.806 0.896 5
146 grok-3-beta zh_winterthur csk 0.801 0.738 0.729 0.835 5
147 grok-3-beta zh_winterthur ctr 0.840 0.885 0.721 0.894 5
148 grok-3-beta zh_winterthur dis 0.614 0.831 0.733 0.883 5
149 grok-3-beta zh_winterthur eco 0.467 0.793 0.763 0.857 5
150 grok-3-beta zh_winterthur eld 0.771 0.914 0.835 0.917 5
151 grok-3-beta zh_winterthur far 0.752 0.907 0.835 0.899 5
152 grok-3-beta zh_winterthur fis 0.647 0.908 0.567 0.925 5
153 grok-3-beta zh_winterthur lan 0.825 0.901 0.868 0.907 5
154 grok-3-beta zh_winterthur par 0.877 0.929 0.781 0.941 5
155 grok-3-beta zh_winterthur sub 0.726 0.904 0.881 0.920 5
156 grok-3-beta zh_winterthur vil 0.878 0.833 0.781 0.910 5

Model/Survey DRI Plots

Survey/Role DRI Plots

Permutation tests

Surveys and Roles: Are models truly consistent across roles?

In this first permutation test, we explore the likelihood that the consistency, measured by DRI, is due to chance.

## Warning: Using `bins = 30` by default. Pick better value with the argument
## `bins`.

Number of surveys (out of 3) in which each role is significant (p < 0.05)
role sig
csk 0
eld 1
fis 1
lan 1
sub 1
coa 2
ctr 2
dis 2
eco 2
far 2
par 2
vil 2
Number of roles (out of 12) that are significant (p < 0.05) in each survey
survey sig
ccps 1
zh_winterthur 8
energy_futures 9
Survey/Role Permutation Summary
obs_dri p n min max median iqr mean sd se ci survey role
0.682 0.000 10000 0.581 0.673 0.618 0.021 0.619 0.015 0 0 energy_futures sub
0.522 0.000 10000 0.509 0.521 0.514 0.003 0.514 0.002 0 0 zh_winterthur eco
0.534 0.000 10000 0.518 0.534 0.526 0.003 0.526 0.002 0 0 zh_winterthur par
0.415 0.000 10000 0.309 0.415 0.352 0.025 0.354 0.018 0 0 energy_futures eld
0.631 0.000 10000 0.590 0.632 0.607 0.008 0.607 0.006 0 0 energy_futures par
0.509 0.001 10000 0.480 0.511 0.493 0.007 0.494 0.005 0 0 energy_futures coa
0.487 0.002 10000 0.475 0.488 0.482 0.003 0.482 0.002 0 0 zh_winterthur coa
0.566 0.002 10000 0.556 0.568 0.562 0.002 0.562 0.001 0 0 zh_winterthur far
0.604 0.006 10000 0.571 0.612 0.589 0.008 0.589 0.006 0 0 zh_winterthur ctr
0.369 0.006 10000 0.294 0.378 0.337 0.021 0.337 0.014 0 0 energy_futures lan
0.500 0.007 10000 0.482 0.503 0.492 0.004 0.492 0.003 0 0 zh_winterthur dis
0.343 0.011 10000 0.283 0.353 0.316 0.016 0.316 0.011 0 0 energy_futures vil
0.603 0.012 10000 0.587 0.606 0.597 0.005 0.597 0.003 0 0 ccps eco
0.527 0.012 10000 0.509 0.533 0.519 0.005 0.519 0.004 0 0 zh_winterthur fis
0.382 0.018 10000 0.343 0.391 0.367 0.010 0.367 0.007 0 0 energy_futures ctr
0.590 0.024 10000 0.571 0.595 0.583 0.005 0.583 0.003 0 0 zh_winterthur vil
0.564 0.025 10000 0.542 0.570 0.555 0.007 0.555 0.005 0 0 energy_futures far
0.408 0.032 10000 0.383 0.414 0.399 0.007 0.399 0.005 0 0 energy_futures dis
0.813 0.057 10000 0.742 0.847 0.784 0.022 0.785 0.016 0 0 energy_futures csk
0.434 0.062 10000 0.396 0.449 0.423 0.010 0.423 0.007 0 0 energy_futures fis
0.725 0.078 10000 0.703 0.737 0.717 0.007 0.717 0.005 0 0 energy_futures eco
0.548 0.121 10000 0.539 0.555 0.545 0.003 0.545 0.002 0 0 ccps par
0.553 0.163 10000 0.534 0.565 0.549 0.006 0.549 0.005 0 0 ccps lan
0.527 0.183 10000 0.511 0.536 0.523 0.005 0.523 0.004 0 0 ccps eld
0.602 0.215 10000 0.577 0.623 0.597 0.009 0.597 0.007 0 0 ccps vil
0.517 0.238 10000 0.512 0.521 0.516 0.002 0.516 0.001 0 0 ccps dis
0.447 0.264 10000 0.437 0.454 0.445 0.004 0.445 0.003 0 0 ccps fis
0.582 0.303 10000 0.564 0.598 0.577 0.010 0.579 0.006 0 0 zh_winterthur sub
0.446 0.344 10000 0.418 0.467 0.443 0.010 0.443 0.007 0 0 ccps sub
0.639 0.395 10000 0.612 0.674 0.635 0.018 0.638 0.011 0 0 zh_winterthur eld
0.571 0.449 10000 0.564 0.580 0.571 0.003 0.571 0.002 0 0 ccps far
0.651 0.480 10000 0.629 0.675 0.650 0.010 0.651 0.007 0 0 ccps csk
0.723 0.486 10000 0.698 0.757 0.722 0.011 0.723 0.008 0 0 zh_winterthur lan
0.768 0.653 10000 0.753 0.793 0.771 0.009 0.771 0.006 0 0 zh_winterthur csk
0.562 0.922 10000 0.556 0.581 0.567 0.005 0.567 0.004 0 0 ccps coa
0.543 0.938 10000 0.538 0.561 0.548 0.004 0.548 0.003 0 0 ccps ctr

Models and Surveys: Which models are consistent across roles?

## Warning: Using `bins = 30` by default. Pick better value with the argument
## `bins`.

Survey/Model Permutation Summary
obs_dri p n min max median iqr mean sd se ci survey model
0.417 0 10000 -0.296 0.224 -0.236 0.111 -0.202 0.074 0.001 0.001 ccps claude-3-5-sonnet-20241022
0.676 0 10000 -0.162 0.375 -0.117 0.136 -0.072 0.085 0.001 0.002 ccps claude-3-7-sonnet-20250219
0.427 0 10000 -0.300 0.259 -0.257 0.116 -0.219 0.073 0.001 0.001 ccps grok-3-beta
0.711 0 10000 -0.110 0.404 -0.061 0.129 -0.018 0.081 0.001 0.002 ccps gemini-2.5-flash
0.497 0 10000 -0.303 0.211 -0.227 0.117 -0.193 0.077 0.001 0.002 energy_futures claude-3-5-sonnet-20241022
0.591 0 10000 -0.212 0.280 -0.154 0.124 -0.118 0.080 0.001 0.002 energy_futures claude-3-7-sonnet-20250219
0.691 0 10000 -0.144 0.528 -0.088 0.129 -0.051 0.082 0.001 0.002 energy_futures grok-3-beta
0.527 0 10000 -0.245 0.246 -0.150 0.113 -0.131 0.079 0.001 0.002 energy_futures gemini-2.5-flash
0.624 0 10000 -0.160 0.442 -0.119 0.125 -0.074 0.080 0.001 0.002 zh_winterthur claude-3-5-sonnet-20241022
0.649 0 10000 -0.132 0.343 -0.081 0.124 -0.040 0.078 0.001 0.002 zh_winterthur claude-3-7-sonnet-20250219
0.759 0 10000 -0.030 0.437 0.007 0.126 0.050 0.078 0.001 0.002 zh_winterthur grok-3-beta
0.677 0 10000 -0.186 0.457 -0.137 0.130 -0.093 0.084 0.001 0.002 zh_winterthur gemini-2.5-flash

All models seem to be consistent across roles. None of the 10,000 permutations produced a DRI as high as the observed DRI (so the reported p = 0 should be read as p < 0.0001), suggesting that the observed value is unlikely to be due to chance.
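
The logic of such a permutation test can be sketched in base R. The example below is not the `permute_dri()` implementation. The helpers are hypothetical, the DRI formula is a simplified version assumed from the published definition, and the shuffling scheme (reassigning policy rankings to participants at random) is an assumption about how the null distribution is built.

```r
# Illustrative sketch only; not the permute_dri() implementation.

# Pairwise Spearman correlations between participants (rows).
pairwise_cor <- function(ratings) {
  r <- cor(t(ratings), method = "spearman")
  r[upper.tri(r)]
}

# Simplified DRI. ASSUMPTION: distance of each pair from the 1:1 line,
# rescaled with lambda, as in the published definition of the index.
sketch_dri <- function(considerations, policies) {
  d <- abs(pairwise_cor(considerations) - pairwise_cor(policies)) / sqrt(2)
  lambda <- 1 - sqrt(2) / 2
  2 * (((1 - mean(d)) - lambda) / (1 - lambda)) - 1
}

# ASSUMPTION: the null is built by shuffling which policy ranking belongs
# to which participant, which breaks the considerations/preferences link.
permute_sketch <- function(considerations, policies, iterations = 1000) {
  observed <- sketch_dri(considerations, policies)
  permuted <- replicate(iterations, {
    shuffled <- policies[sample(nrow(policies)), , drop = FALSE]
    sketch_dri(considerations, shuffled)
  })
  list(observed = observed,
       permuted = permuted,
       p = mean(permuted >= observed))  # share of permutations at or above observed
}

set.seed(42)
# Synthetic data: 8 participants, 10 considerations, 5 ranked policies.
considerations <- t(replicate(8, sample(1:10)))
policies <- t(replicate(8, sample(1:5)))
res <- permute_sketch(considerations, policies, iterations = 500)
print(res$p)
```

With this estimator, p is exactly 0 whenever no permuted value reaches the observed one. A common alternative is `(sum(permuted >= observed) + 1) / (iterations + 1)`, which never returns 0.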

References